---
sidebar_position: 1
title: "Grafast Introduction"
toc_max_heading_level: 4
---

import Mermaid from "@theme/Mermaid";
import mermaidPlan from "../examples/users-and-friends/plan-simplified.mermaid?raw";

# <Grafast /> introduction

:::info

This introduction to Gra*fast* assumes that you have a basic understanding of
GraphQL, including the concept of resolvers. We highly recommend that you read
[Introduction to GraphQL](https://graphql.org/learn/), and in particular
[GraphQL Execution](https://graphql.org/learn/execution/), before reading this
document.

:::

**Gra*fast* is a radical new approach to executing GraphQL requests.**

The GraphQL specification details execution via "resolvers": the value for each
field is determined by calling the relevant (potentially asynchronous) function,
passing the parent object and the field's arguments (if any). This localised
reasoning is simple to specify and enforces the "graph" nature of GraphQL (the
[value of a node is independent of the path through which it was
fetched](https://benjie.dev/graphql/traversal)), but its short-sighted approach
means the next layer of data requirements is discovered only after the
previous layer has executed. Furthermore, each entry in a list is processed
separately, so for a list with N items, each descendant selection may produce
N additional fetches from the backend; this is called the N+1 problem. It
compounds quickly, becoming even more devastating in nested lists.
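
To make the cascade concrete, here is a minimal sketch (hypothetical data and
helpers, not real resolvers) of resolver-style fetching for a user's friends:

```ts
// Hypothetical illustration of the N+1 cascade; the in-memory "database"
// stands in for a real backend.
type User = { id: number; friendIds: number[] };

const db = new Map<number, User>([
  [1, { id: 1, friendIds: [2, 3] }],
  [2, { id: 2, friendIds: [] }],
  [3, { id: 3, friendIds: [] }],
]);

let queryCount = 0;

async function fetchUser(id: number): Promise<User> {
  queryCount++; // every call is a separate round-trip to the backend
  return db.get(id)!;
}

// Resolver-style logic: one query for the parent, then one per friend.
// For a user with N friends this issues N+1 queries in total.
async function friendsOf(userId: number): Promise<User[]> {
  const user = await fetchUser(userId); // the "+1"
  return Promise.all(user.friendIds.map((id) => fetchUser(id))); // the "N"
}
```

With two friends this issues three backend calls; nest one more list level and
the count multiplies again.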

Most GraphQL servers follow the GraphQL specification execution algorithm
verbatim, recommending the [DataLoader][] pattern to avoid the N+1 cascade (but
resulting in an explosion of Promises instead) and doing little to address the
over-fetching and under-fetching problems that just-in-time requirements
discovery leads to.

&ZeroWidthSpace;<Grafast /> was designed from the ground up to eliminate these
issues and more whilst maintaining pleasant APIs for developers to use.

## Why Gra*fast*?

Gra*fast* extends GraphQL's **declarative** aesthetic to execution, taking a
holistic approach to understanding the incoming request via planning. Batching
is baked in so you never need to think about the N+1 problem again, simple
optimizations such as field selection and work deduplication require almost no
effort, and more advanced optimizations are now achievable without herculean
effort.

### Extremely efficient

Plan resolvers ergonomically describe each field's requirements. Gra*fast* walks
the document and assembles these requirements into a draft execution plan, which
it then optimizes before execution. With Gra*fast*, you can:

- **eliminate under-fetching** by eager loading when it makes sense, reducing
  the round-trips required to your backend data stores;
- **eliminate over-fetching** by only requesting what's needed from your
  business logic;
- **streamline** data pipelines by eliminating redundant work;
- **eliminate the N+1 problem** by design, since Gra*fast* batch processes all
  values via a single `execute()` method[^1]; and
- **reduce Promise usage by orders of magnitude** since, unlike [DataLoader][],
  Gra*fast*'s batching is built-in and does not need a promise for each
  individual item load.

Because planning understands the entire request, optimisations apply
operation-wide rather than being sprinkled field-by-field in resolvers. Plan
resolvers can be written ergonomically in terms of data flow without needing to
think about optimization patterns, which are handled at a broader level.

[^1]:
    You can opt out of batch processing on a per-step basis (like `lambda()`
    and `sideEffect()` do), but Gra*fast*'s model is built around batching.
    Unbatched execution is an optimization for certain shapes of logic (typically
    transforms) - it's the exception rather than the rule.

### Spec compliant

The GraphQL specification
[notes](https://spec.graphql.org/draft/#sec-Conforming-Algorithms):

> _Conformance requirements [...] can be fulfilled [...] in any way as long as
> the perceived result is equivalent._  
> ─ https://spec.graphql.org/draft/#sec-Conforming-Algorithms

Gra*fast* has been written very carefully by a [GraphQL Technical Steering
Committee](https://github.com/graphql/graphql-wg/blob/main/GraphQL-TSC.md#tsc-members)
member to ensure that the perceived result is equivalent; thus, despite its
drastically different execution algorithm, it is 100% spec compliant.

### Compatible with (most) resolvers

Gra*fast* implements resolver emulation, enabling the vast majority of
GraphQL.js schema resolvers to be executed via <Grafast /> directly. Doing so
will not reap the benefits of planning, but it does go to show that everything
that can be done in a resolver can be done in a plan (since in <Grafast />,
resolvers are emulated via plans themselves!).

Bring your existing schema, and port it to plans on a field-by-field basis!

### Arbitrary data-sources

Gra*fast* is not tied to any particular storage or business logic layer — any
valid GraphQL schema could be implemented with <Grafast />, and a <Grafast />
schema can query any data source, service, or business logic that Node.js can
query. We do have highly optimized steps available for particular data stores,
but you can reap huge benefits from just switching from using DataLoader in
resolvers to using [`loadOne()`](./standard-steps/loadOne.md) and
[`loadMany()`](./standard-steps/loadMany.md) in plan resolvers
&mdash; and they can even use the same callback!
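
As a sketch of what "the same callback" means (the in-memory table and ids
here are hypothetical): a batch callback of the shape DataLoader expects, N
keys in and N ordered results out, is also broadly the shape that `loadOne()`
and `loadMany()` accept:

```ts
// Hypothetical batch callback shared between both worlds: given N ids it
// returns N results in the same order (null for misses), which is the
// DataLoader batch-function contract.
const usersTable = new Map([
  [1, { id: 1, full_name: "Alice" }],
  [2, { id: 2, full_name: "Bob" }],
]);

async function getUsersByIds(ids: readonly number[]) {
  return ids.map((id) => usersTable.get(id) ?? null);
}

// Resolver world (sketch):
//   const userLoader = new DataLoader(getUsersByIds);
// Plan-resolver world (sketch):
//   const $user = loadOne($userId, getUsersByIds);
```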

## Request lifecycle

Gra*fast* expects the incoming document to be parsed and validated with
`graphql-js` before planning; passing an unvalidated document may lead to
unexpected behaviour. Gra*fast* only replaces execution, and can be used as a
substitute for the GraphQL `execute` and `subscribe` methods in many servers.

### Reusing an operation plan

Once a validated operation arrives, Gra*fast* looks for an existing **operation
plan** in its cache (keyed by schema, document, and operation name[^2]) and uses
it for execution if found.

[^2]:
    Currently, plan reuse also considers some additional constraints to
    support the `@skip` and `@include` directives, but these constraints are being
    phased out.

### Establishing a new operation plan

The time during which a new operation plan is being established is called
"plan-time".

To establish an operation plan for a never-seen-before operation, Gra*fast*
walks the document in a breadth-first manner and calls the relevant **plan
resolver** for each field, argument, and abstract type that it finds. A field's
plan resolver may construct zero or more steps and must return exactly one
**step** suitable to produce its desired output.

The steps from all of these plan resolvers are combined to form the **execution
plan**, a directed acyclic graph (DAG) that details the flow of information
during execution, and an **output plan** which details how to turn the result of
this graph into a valid GraphQL response.

The execution plan is optimized via principled communication with and between
the various steps therein: deduplicating redundant work, fusing related steps
(e.g. joins and subqueries in a database, additional "includes" in REST APIs,
or similar forms of eager-loading), creating optimized data flows, and
ultimately building the most optimal plan for execution that it can.

### Execution

Once the operation plan for a request has been established, the execution plan
is executed, and formatted into the GraphQL response via the output plan.

The time during which an established operation plan is being executed is called
"execution-time".

:::info[Thinking in plans]

Ready for a deeper dive into how the data flows between steps? Continue with
[Thinking in plans](./flow.mdx).

:::

## Plan resolvers

_This is just an overview, for full documentation see
[Plan Resolvers](./plan-resolvers)._

Though traditional resolvers are supported via emulation, you are encouraged to
use native [**plan&nbsp;resolvers**](/grafast/plan-resolvers).

Plan resolvers are small functions that are called at plan-time to produce
**steps** (the building blocks of an [**execution
plan**](/grafast/operation-plan#execution-plan)) to detail actions sufficient to
produce the value for this field. The execution plan is the combination of all
of these steps, and details actions sufficient to satisfy a GraphQL request.

Imagine that we have this GraphQL schema:

```graphql
type Query {
  currentUser: User
}
type User {
  name: String!
  friends: [User!]!
}
```

In [graphql-js][], you might have these resolvers:

```ts
const resolvers = {
  Query: {
    async currentUser(_, args, context) {
      return context.userLoader.load(context.currentUserId);
    },
  },
  User: {
    name(user) {
      return user.full_name;
    },
    async friends(user, args, context) {
      const friendships = await context.friendshipsByUserIdLoader.load(user.id);
      const friends = await Promise.all(
        friendships.map((friendship) =>
          context.userLoader.load(friendship.friend_id),
        ),
      );
      return friends;
    },
  },
};
```

In <Grafast />, we use [**plan resolvers**](/grafast/plan-resolvers) instead,
which might look something like:

```ts
const planResolvers = {
  Query: {
    currentUser() {
      return userById(context().get("currentUserId"));
    },
  },
  User: {
    name($user) {
      return $user.get("full_name");
    },
    friends($user) {
      const $friendships = friendshipsByUserId($user.get("id"));
      const $friends = each($friendships, ($friendship) =>
        userById($friendship.get("friend_id")),
      );
      return $friends;
    },
  },
};
```

As you can see, the shape of the logic is quite similar, but the <Grafast />
plan resolvers are synchronous. <Grafast /> operates in two phases: planning
(synchronous) and execution (asynchronous); plan resolvers are called during
the planning phase (plan-time).

:::info See the working example

If you want to explore the two code blocks above, and see them in context
including their dependencies, please see the ["users and friends"
example](https://github.com/graphile/crystal/tree/main/grafast/website/examples/users-and-friends).

:::

The job of a plan resolver is not to retrieve data; it's to detail the **steps**
necessary to retrieve it. Plan resolvers do not have access to any request
data; they must describe what to do for arbitrary future data. For example, the
`User.friends` Gra*fast* plan resolver cannot loop through the data with a `map`
function as in the resolver example (since there is not yet any data to loop
over), instead it describes the plan to do so using an [`each`
step](/grafast/standard-steps/each), detailing what to do with each
item that will be seen at execution-time.

:::tip The dollar convention

By convention, when a variable represents a <Grafast /> step, the variable will
be named starting with a `$` (dollar symbol). This helps to indicate that the
variable will never "resolve" to an execution-time value, but instead represents
a unit of execution in the overall plan.

:::

## Steps

Steps are the basic building blocks of a <Grafast /> plan; they are instances
of a step class, constructed via the function calls in the plan resolver. Step
classes describe how to perform a specific action and help plan how to perform
the action more efficiently via the **lifecycle methods**. <Grafast /> provides
optimized built-in steps for common needs; it's common that you can get started
using just these, but as you go about optimizing your schema further it's
expected that you will build your own step classes, in the same way that you'd
build DataLoaders in a resolver-based GraphQL API.

If we were to make a request to the above <Grafast /> schema with the following
query:

```graphql
{
  currentUser {
    name
    friends {
      name
    }
  }
}
```

&ZeroWidthSpace;<Grafast /> would build an [**operation
plan**](/grafast/operation-plan) for the operation. For the above query, a
[**plan diagram**](/grafast/plan-diagrams) representing the execution portion
of this operation plan is:

<Mermaid chart={mermaidPlan} />

Each node in this diagram represents a **step** in the operation plan, and the
arrows show how the data flows between these steps.

:::tip Plans can be reused for multiple requests

When the same operation is seen again its existing plan can (generally) be
reused; this is why, to get the very best performance from <Grafast />, you
should use static GraphQL documents and pass variables at run-time.

:::

### Batched execution

The main concern of most steps is execution. In <Grafast /> all execution is
batched, so each of the nodes in the operation plan will execute at most once
during a GraphQL query or mutation. This is one of the major differences when
compared to traditional GraphQL execution; with traditional resolvers
processing happens in a layer-by-layer, item-by-item approach, requiring
workarounds such as `DataLoader` to help reduce instances of the N+1 problem.

When it comes time to execute an operation plan, <Grafast /> will automatically
populate the steps whose names begin with `__` (e.g. the context and variable
values) and then will begin the process of executing each step
once all of its dependencies are ready, continuing until all steps are
complete.

At planning time a step can add a dependency on another step via `const depId =
this.addDependency($otherStep);`. This `depId` is the index in the **values
tuple** that the step can use at execution time to retrieve the associated
values.

When a step executes, its `execute` method is passed the **execution
details** which includes:

- `count` — the size of the batch to be executed
- `values` — the **values tuple**, the values for each of the dependencies the
  step added
- `indexMap(callback)` — method returning an array by calling `callback(i)` for
  each index `i` in the batch (from `0` to `count-1`)

The `execute` method must return a list (or a promise to a list) of length
`count`, where each entry in this list relates to the corresponding entries in
`values` — this should be at least a little familiar to anyone who has written
a DataLoader before.
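
As a standalone sketch of this contract (the `ExecutionDetails` type below is
a simplification for illustration, not the real <Grafast /> type), a step with
one dependency that doubles each batched value might implement `execute` like
so:

```ts
// Standalone sketch of the execute() contract described above; this
// ExecutionDetails type is a simplification, not the real grafast type.
type ExecutionDetails = {
  count: number;
  values: Array<{ at(i: number): any }>;
  indexMap<T>(callback: (i: number) => T): T[];
};

// A "step" with one dependency that doubles each batched value. Note it
// returns exactly `count` results, aligned index-for-index with its inputs.
const doubleStep = {
  async execute({ values, indexMap }: ExecutionDetails) {
    const dep = values[0]; // values tuple entry for our only dependency
    return indexMap((i) => dep.at(i) * 2);
  },
};
```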

When a plan starts executing it always starts with a batch size (`count`) of 1;
but many things may affect this batch size for later steps — for example when
processing the items in a list, the batch must grow to contain each item (via
the `__Item` step). <Grafast /> handles all of these complexities for you
internally, so you don't generally need to think about them.

#### Unary steps

A "unary step" is a regular step which the system has determined will always
represent exactly one value. The system steps which represent request-level
data (e.g. context, variable and argument values) are always unary steps,
and&nbsp;<Grafast /> will automatically determine which other steps are also
unary steps.

Sometimes you'll want to ensure that one or more of the steps your step class
depends on will have exactly one value at runtime; to do so, you can use
`this.addUnaryDependency($step)` rather than `this.addDependency($step)`.
This
ensures that the given dependency will always be a unary step, and is primarily
useful when a parameter to a remote service request needs to be the same for
all entries in the batch; typically this will be the case for ordering,
pagination and access control. For example if you're retrieving the first N
pets from each of your friends you might want to add `limit N` to an SQL query
— by adding the N as a unary dependency you can guarantee that there will be
exactly one value of N for each execution, and can construct the SQL query
accordingly (see `limitSQL` in the example below).

#### SQL example

Here's a step class which retrieves records matching a given column (i.e.
`WHERE columnName = $columnValue`) from a given table in an SQL database.
Optionally, you may request to limit to the first `$first` results.

```ts
export class RecordsByColumnStep extends Step {
  constructor(tableName, columnName, $columnValue) {
    super();
    this.tableName = tableName;
    this.columnName = columnName;
    this.columnValueDepIdx = this.addDependency($columnValue);
  }

  setFirst($first) {
    this.firstDepId = this.addUnaryDependency($first);
  }

  async execute({ indexMap, values }) {
    // Retrieve the values for the `$columnValue` dependency
    const columnValueDep = values[this.columnValueDepIdx];

    // We may or may not have added a `$first` limit:
    const firstDep =
      this.firstDepId !== undefined ? values[this.firstDepId] : undefined;

    // firstDep, if it exists, is definitely a unary dep (!firstDep.isBatch), so
    // we can retrieve its value directly:
    const first = firstDep ? parseInt(firstDep.value, 10) : null;

    // Create a `LIMIT` clause in our SQL if the user specified a `$first` limit:
    const limitSQL = Number.isFinite(first) ? `limit ${first}` : ``;

    // Create placeholders for each entry in our batch in the SQL:
    const placeholders = indexMap(() => "?");
    // The value from `$columnValue` for each index `i` in the batch
    const columnValues = indexMap((i) => columnValueDep.at(i));

    // Build the SQL query to execute:
    const sql = `\
      select *
      from ${this.tableName}
      where ${this.columnName} in (${placeholders.join(", ")})
      ${limitSQL}
    `;

    // Execute the SQL query once for all values in the batch:
    const rows = await executeSQL(sql, columnValues);

    // Figure out which rows relate to which batched inputs:
    return indexMap((i) =>
      rows.filter((row) => row[this.columnName] === columnValues[i]),
    );
  }
}

function petsByOwnerId($ownerId) {
  return new RecordsByColumnStep("pets", "owner_id", $ownerId);
}
```

Notice that there's only a single `await` call in this step's execute method,
and we already know the step is only executed once per request; compare
this single asynchronous action with the number of promises that would need
to be created were you to use `DataLoader` instead.

:::info Not just databases!

The `execute` method is just JavaScript; it can
talk to absolutely any data source that Node.js itself can talk to. Though the
example shows SQL you could replace the `executeSQL()` call with `fetch()` or
any other arbitrary JavaScript function to achieve your goals.

:::

:::note Simplified example

The code above was written to be a simple example; though it works ([see full
solution using
it](https://github.com/graphile/crystal/blob/main/grafast/website/grafast/index.example.mjs)),
it's not nearly as good as it could be — for example it does not track the
columns accessed so that only these columns are retrieved, nor does it use
lifecycle methods to determine more optimal ways of executing.

(Another thing: it passes the `tableName` and `columnName` values directly into
SQL — it would be safer to use an `escapeIdentifier()` call around these.)

:::

### Step lifecycle

The [**execution plan**](/grafast/operation-plan#execution-plan) diagram you saw
above is the final form of the plan; it may have passed through many
intermediate states to reach this most optimal form, made possible
by <Grafast />'s lifecycle methods.

:::info Finding more information

This is an overview, for full documentation see [lifecycle][lifecycle].

For more information about understanding plan diagrams please see
[Plan Diagrams](/grafast/plan-diagrams).

For a fully working implementation of the above schema, please see the
["users and friends" example](https://github.com/graphile/crystal/tree/main/grafast/website/examples/users-and-friends).

:::

All plan lifecycle methods are optional, and due to the always-batched nature
of <Grafast /> plans you can get good performance without using any of them
(performance generally on a par with reliable usage of DataLoader). However, if
you leverage lifecycle methods your performance can go from "good" to
:sparkles:**_amazing_**:rocket:.

One of the great things about <Grafast />'s design is that you don't need to
build these optimizations from the start; you can implement them at a later
stage, making your schema faster without requiring changes to your business
logic _or_ your plan resolvers!

As a very approximate overview:

- once a field is planned we **deduplicate** each new step
- once the execution plan is complete, we **optimize** each step
- finally, we **finalize** each step

### Deduplicate

**Deduplicate** lets a step indicate which of its peers (as identified
by <Grafast />) are equivalent to it. One of these peers can then, if possible, replace the
new step, thereby reducing the number of steps in the plan (and allowing more
optimal code paths deeper in the plan tree).
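
A sketch of the idea (using a bare stand-in class rather than the
real <Grafast /> `Step` base class): `deduplicate` receives candidate peers
and returns those equivalent to the current step:

```ts
// Sketch only: a stand-in class, not the real grafast Step base class.
// deduplicate() receives candidate peers and returns the subset that are
// equivalent to this step; the planner may then swap this step for one of them.
class FetchByTableStep {
  constructor(public tableName: string) {}

  deduplicate(peers: FetchByTableStep[]): FetchByTableStep[] {
    // Two fetches against the same table are interchangeable here:
    return peers.filter((peer) => peer.tableName === this.tableName);
  }
}
```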

### Optimize

**Optimize** serves two purposes.

Purpose one is that optimize lets a step "talk" to its ancestors, typically to
tell them about data that will be needed so that they may fetch it proactively.
This should not change the observed behavior of the ancestor (e.g. you should
not use it to apply filters to an ancestor — this may contradict the GraphQL
specification!) but it can be used to ask the ancestor to fetch additional
data.

The second purpose is that optimize can be used to replace the step being
optimized with an alternative (presumably more optimal) step. This may result
in multiple steps being dropped from the plan graph due to "tree shaking." This
might be used when the step has told an ancestor to fetch additional data and
the step can then replace itself with a simple "access" step. It can also be
used to dispose of plan-only steps that have meaning at planning time but have
no execution-time behaviors.

In the "friends" example above, this was used to change the DataLoader-style
`select * from ...` query to a more optimal `select id, full_name from ...`
query. In more advanced plans (for example those made available through
[@dataplan/pg][]), optimize can go much further, for example inlining its data
requirements into a parent and replacing itself with a simple "remap keys"
function.

### Finalize

**Finalize** is the final method called on a step; it gives the step a chance to
do anything that it would generally only need to do once. For example, a step
that issues a GraphQL query to a remote server might take this opportunity to
build the GraphQL query string once. A step that converts a tuple into an
object might build an optimized function to do so.
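
A sketch of the idea (a bare stand-in class, not the real <Grafast /> `Step`
base class): building the query text once in `finalize` so execution never has
to rebuild it:

```ts
// Sketch only: a stand-in class, not the real grafast Step base class.
// finalize() runs once, after planning, so per-request execution never has
// to rebuild this query text.
class RecordsStep {
  finalizedSql: string | undefined;

  constructor(
    public tableName: string,
    public columns: string[],
  ) {}

  finalize() {
    this.finalizedSql = `select ${this.columns.join(", ")} from ${this.tableName}`;
  }
}
```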

## Further optimizations

&ZeroWidthSpace;<Grafast /> doesn't just help your schema to execute fewer and more efficient
steps, it also optimizes how your data is output once it has been determined.
This means that even without making a single change to your existing GraphQL
schema (i.e. without adopting plans), running it through <Grafast /> rather than
graphql-js should result in a modest speedup, especially if you need to output
your result as a string (e.g. over a network socket/HTTP).

## Convinced?

If you're not convinced, please do reach out via the [Graphile Discord][] with
your queries; we'd love to make improvements to both this page and <Grafast />
itself!

If you are convinced, why not continue on with the navigation button below...

:::info[Not in the JS world?]

Currently <Grafast /> is implemented in TypeScript, but we're working on a
specification with hopes to extend <Grafast />'s execution approach to other
programming languages. If you're interested in implementing <Grafast />'s
execution algorithm in a language other than JavaScript, please get in touch!

:::

[graphql-js]: https://github.com/graphql/graphql-js
[dataloader]: https://github.com/graphql/dataloader
[graphile discord]: https://discord.gg/graphile
[@dataplan/pg]: ./step-library/dataplan-pg
[lifecycle]: ./step-classes#lifecycle-methods
